This notebook performs a basic overview of the data and analyses the
genetic diversity indicators. It takes as input the “clean kobo output”
produced by 1.2_cleaning.
Load required libraries:
Load required functions. These custom functions are available at: https://github.com/AliciaMstt/GeneticIndicators
Other custom functions:
Custom colors:
Get indicator data from the clean kobo output
# Get data:
kobo_clean<-read.csv(file="kobo_output_clean.csv", header=TRUE)
# Extract indicator 1 data from kobo output, show most relevant columns
ind1_data<-get_indicator1_data(kobo_output=kobo_clean)
## [1] "the data already contained a taxon column, that was used instead of creating a new one"
head(ind1_data[,c(1:3, 12:14)])
# Extract indicator 2 data from kobo output, show most relevant columns
ind2_data<-get_indicator2_data(kobo_output=kobo_clean)
## [1] "the data already contained a taxon column, that was used instead of creating a new one"
head(ind2_data[,c(1:3, 9:10,13)])
# Extract indicator 3 data from kobo output, show most relevant columns
ind3_data<-get_indicator3_data(kobo_output=kobo_clean)
## [1] "the data already contained a taxon column, that was used instead of creating a new one"
head(ind3_data[,c(1:3, 9:11)])
# extract metadata, show most relevant columns
metadata<-get_metadata(kobo_output=kobo_clean)
## [1] "the data already contained a taxon column, that was used instead of creating a new one"
head(metadata[,c(1:3, 12, 25,26, 64)])
# save processed data
write.csv(ind1_data, "ind1_data.csv", row.names = FALSE, fileEncoding = "UTF-8")
write.csv(ind2_data, "ind2_data.csv", row.names = FALSE, fileEncoding = "UTF-8")
write.csv(ind3_data, "ind3_data.csv", row.names = FALSE, fileEncoding = "UTF-8")
write.csv(metadata, "metadata.csv", row.names = FALSE, fileEncoding = "UTF-8")
The methods used to define populations come from a checkbox question where one or more of the following categories can be selected: genetic_clusters, geographic_boundaries, eco_biogeo_proxies, adaptive_traits, management_units, other. As a consequence, any combination of them is possible, leading to the following results:
##
## adaptive_traits
## 3
## adaptive_traits management_units
## 20
## eco_biogeo_proxies
## 37
## eco_biogeo_proxies adaptive_traits
## 2
## eco_biogeo_proxies management_units
## 5
## eco_biogeo_proxies other
## 3
## genetic_clusters
## 101
## genetic_clusters adaptive_traits
## 4
## genetic_clusters eco_biogeo_proxies
## 20
## genetic_clusters eco_biogeo_proxies adaptive_traits
## 3
## genetic_clusters eco_biogeo_proxies adaptive_traits management_units
## 2
## genetic_clusters eco_biogeo_proxies management_units
## 1
## genetic_clusters geographic_boundaries
## 71
## genetic_clusters geographic_boundaries adaptive_traits
## 4
## genetic_clusters geographic_boundaries eco_biogeo_proxies
## 7
## genetic_clusters geographic_boundaries eco_biogeo_proxies adaptive_traits
## 1
## genetic_clusters geographic_boundaries eco_biogeo_proxies adaptive_traits management_units
## 1
## genetic_clusters geographic_boundaries eco_biogeo_proxies management_units
## 1
## genetic_clusters geographic_boundaries management_units
## 8
## genetic_clusters management_units
## 6
## genetic_clusters other
## 2
## geographic_boundaries
## 254
## geographic_boundaries adaptive_traits
## 30
## geographic_boundaries adaptive_traits management_units
## 12
## geographic_boundaries adaptive_traits management_units other
## 1
## geographic_boundaries eco_biogeo_proxies
## 21
## geographic_boundaries eco_biogeo_proxies adaptive_traits
## 3
## geographic_boundaries eco_biogeo_proxies management_units
## 3
## geographic_boundaries eco_biogeo_proxies other
## 2
## geographic_boundaries management_units
## 23
## geographic_boundaries other
## 9
## management_units
## 112
## management_units other
## 2
## other
## 20
It is hard to group the above methods, so we will keep the original groups with n >= 19 in the above list and tag the combinations that appear less often as “low_freq_combinations”.
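A minimal base R sketch of this simplification, on hypothetical toy answers. It assumes, as the tables above suggest, that each record stores its checkbox answers as a single space-separated string; the threshold is lowered from 19 to 2 only so that the toy data has something to collapse:

```r
# Hypothetical checkbox answers, one string per record
defined_populations <- c("geographic_boundaries", "geographic_boundaries",
                         "genetic_clusters", "genetic_clusters",
                         "genetic_clusters other", "management_units")

# How often each combination of methods occurs
combo_counts <- table(defined_populations)

# Keep combinations seen at least min_n times and tag the rest
min_n <- 2  # the notebook uses n >= 19; lowered here to suit the toy data
frequent <- names(combo_counts)[combo_counts >= min_n]
defined_populations_simplified <- ifelse(defined_populations %in% frequent,
                                         defined_populations,
                                         "low_freq_combinations")
print(table(defined_populations_simplified))
```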
Which groups have n >= 19? Check n for the simplified methods:
##
## adaptive_traits management_units
## 20
## eco_biogeo_proxies
## 37
## genetic_clusters
## 101
## genetic_clusters eco_biogeo_proxies
## 20
## genetic_clusters geographic_boundaries
## 71
## geographic_boundaries
## 254
## geographic_boundaries adaptive_traits
## 30
## geographic_boundaries eco_biogeo_proxies
## 21
## geographic_boundaries management_units
## 23
## low_freq_combinations
## 85
## management_units
## 112
## other
## 20
Another option is to highlight whether genetic_clusters or geographic_boundaries were used at all, since these are the main drivers. This will look like:
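One way to derive such a flag is sketched below in base R, with hypothetical toy answers and hypothetical label names. Splitting each answer on spaces, rather than pattern matching with grepl(), assumes space-separated answers and avoids accidental partial matches between method names:

```r
# Hypothetical checkbox answers, one string per record
defined_populations <- c("genetic_clusters geographic_boundaries",
                         "geographic_boundaries adaptive_traits",
                         "genetic_clusters",
                         "management_units other")

# Split each answer into its individual methods, then test membership
methods_list    <- strsplit(defined_populations, " ", fixed = TRUE)
uses_genetic    <- vapply(methods_list, function(m) "genetic_clusters" %in% m, logical(1))
uses_geographic <- vapply(methods_list, function(m) "geographic_boundaries" %in% m, logical(1))

# Collapse into one label per record (label names are illustrative)
main_method <- ifelse(uses_genetic & uses_geographic, "genetic_and_geographic",
               ifelse(uses_genetic, "genetic_clusters_used",
               ifelse(uses_geographic, "geographic_boundaries_used", "neither")))
print(main_method)
```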
Table of equivalences:
Records by country, including taxa assessed more than once (see below for details on this)
Did countries use kobo or the tabular format?
Records by taxonomic groups
Some taxa were assessed more than once, for example to account for uncertainty
on how to divide populations. This information is stored in the variable
multiassessment of the metadata (created by get_metadata()). An example of
taxa with multiple assessments:
In total, these are the number of records (assessments) in each category:
##
## multiassessment single_assessment
## 73 721
The above numbers refer to records. To know how many taxa were assessed in each category:
Number of taxa with multiple submissions:
## [1] 35
Number of taxa with single submissions:
## [1] 721
To explore what kind of taxa countries assessed, regardless of whether they assessed them once or more, let's create a dataset keeping all single-assessed taxa plus only the first assessment of taxa assessed multiple times.
How many records?
## [1] 756
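A minimal base R sketch of this bookkeeping on hypothetical toy records. It assumes that a multiassessed taxon is one whose country_assessment + taxon combination appears in more than one record, and that "first assessment" means the first row in file order; get_metadata() may define this differently. As in the counts above, the deduplicated number of records equals multiassessed taxa plus single-assessed taxa (35 + 721 = 756):

```r
# Hypothetical records: taxon "A" was assessed twice in the same country
records <- data.frame(
  country_assessment   = c("mexico", "mexico", "mexico", "sweden"),
  taxon                = c("A", "A", "B", "C"),
  n_extant_populations = c(10, 12, 5, 3)
)

# A record is part of a multiassessment if its country + taxon key repeats
key <- paste(records$country_assessment, records$taxon, sep = "__")
records$multiassessment <- ifelse(key %in% key[duplicated(key)],
                                  "multiassessment", "single_assessment")

# Taxa assessed more than once, and taxa assessed a single time
n_multi_taxa  <- length(unique(key[records$multiassessment == "multiassessment"]))
n_single_taxa <- sum(records$multiassessment == "single_assessment")

# Keep every single assessment plus the first record of each multiassessed taxon
records_first <- records[!duplicated(key), ]
print(nrow(records_first))
```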
Which countries and taxonomic groups do the taxa assessed more than once
belong to?
Now check the taxa assessed excluding duplicates, i.e., the actual number of taxa assessed. This will be used in downstream analyses.
Note: The following plots in this section consider only one record of the taxa that were assessed more than once. That is a total of 756 taxa.
Note on alluvial vs Sankey, taken from the ggalluvial documentation: An important feature of alluvial plots is the meaningfulness of the vertical axis: No gaps are inserted between the strata, so the total height of the plot reflects the cumulative quantity of the observations. The plots produced by {ggalluvial} conform to the “grammar of graphics” principles of {ggplot2}, and this prevents users from producing “free-floating” visualizations like the Sankey diagrams.
Using alluvial:
## [1] "#F8766D" "#F8766D" "#F8766D" "#F8766D" "#F8766D" "#F8766D"
Using ggsankey
## [1] "#F8766D" "#F8766D" "#F8766D" "#F8766D" "#F8766D" "#F8766D"
Using ggsankey option 1
Using ggsankey option 2
Using alluvial:
## [1] "cr" "dd" "en" "lc" "not_assessed"
## [6] "nt" "unknown" "vu"
## [1] "brown2" "brown2" "darkorange" "darkorange" "darkorange"
## [6] "darkgreen"
Using ggsankey
The following plots consider the whole dataset, i.e., including taxa that were assessed more than once (because they could have been analysed using different methods to define populations).
## [1] "#668cd1" "#668cd1" "#668cd1" "#668cd1" "#668cd1" "#45c097"
Same, but only country and method:
Sankey, just because why not:
Population size data may come from different methods for each population within a single taxon. For example, some populations can have Ne estimates, others Nc and others a range. Examples:
Also, for some taxa there may be population size data for some populations but not all. Therefore indicator 1 would be computed with fewer populations than the total number of populations. Example (see pop3, 4, 13, 15):
We need to keep this in mind when interpreting and discussing how the indicator could change in future assessments if data become available for populations that are currently missing.
How many of the 2084 populations have Ne, Nc or range data?
Ne?
## [1] 128
Nc point?
## [1] 616
Nc range?
## [1] 1116
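A minimal sketch of how such counts can be obtained, assuming a hypothetical per-population table with columns Ne, Nc_point and Nc_range in which NA means that kind of data is not available (the real column names come from get_indicator1_data() and may differ). It also lists populations with no size data at all, which would drop out of indicator 1:

```r
# Hypothetical per-population table (one row per population)
pops <- data.frame(
  population = paste0("pop", 1:5),
  Ne         = c(120, NA, NA, NA, 40),
  Nc_point   = c(NA, 3000, NA, NA, NA),
  Nc_range   = c(NA, NA, "<5000", NA, NA)
)

size_cols <- c("Ne", "Nc_point", "Nc_range")

# Number of populations with each kind of size data
counts <- colSums(!is.na(pops[, size_cols]))
print(counts)

# Populations without any size data
no_data <- pops$population[rowSums(!is.na(pops[, size_cols])) == 0]
print(no_data)
```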
Has Ne values?
How is Ne data distributed?
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 1.9 37.9 150.0 28394.3 537.0 3000000.0 1956
Boxplot of Ne values:
Boxplot filtering outliers (Ne)
Indicator 2 is the proportion of populations within species that
are maintained. It can be estimated from
n_extant_populations and n_extint_populations,
as follows:
## [1] 1.0000000 0.5000000 0.2941176 1.0000000 0.3333333 1.0000000
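The values above are consistent with indicator 2 being computed as n_extant_populations / (n_extant_populations + n_extint_populations), i.e. extant populations over all populations (extant plus extinct). A minimal sketch under that assumption, with hypothetical toy counts; the notebook's own computation is done by the custom functions linked at the top and may handle edge cases differently:

```r
# Indicator 2: proportion of populations maintained,
# extant / (extant + extinct); NA if either count is unknown
indicator2 <- function(n_extant, n_extinct) {
  n_extant / (n_extant + n_extinct)
}

# Hypothetical toy counts
n_extant  <- c(4, 2, 5, 7)
n_extinct <- c(0, 2, 12, NA)
print(round(indicator2(n_extant, n_extinct), 4))
```

An unknown number of extinct populations (NA) propagates to an NA indicator value.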
See the distribution of the number of extant populations:
Exclude outliers (>200 populations)
How does the number of populations vary by country? (excluding outliers: >200 pops)
And by method to define populations? (excluding outliers: >200 pops)
Simplified method categories for easier visualization:
Number of populations by taxonomic group:
Taxonomic group and method:
Country and method:
Country and method, but with the US and Sweden on a different scale:
Number of populations by taxonomic group and range type:
Number of populations by taxonomic group and global IUCN:
One-way ANOVA for the effect of the method to define populations on the number of extant pops, removing the extreme outlier (>1,000 pops)
# subset data without massive outlier
ind2_data_anova<- ind2_data %>%
filter(n_extant_populations<1000)
# summary of n per variable
table(ind2_data_anova$defined_populations_simplified)
##
## adaptive_traits management_units
## 19
## eco_biogeo_proxies
## 37
## genetic_clusters
## 98
## genetic_clusters eco_biogeo_proxies
## 19
## genetic_clusters geographic_boundaries
## 71
## geographic_boundaries
## 251
## geographic_boundaries adaptive_traits
## 30
## geographic_boundaries eco_biogeo_proxies
## 21
## geographic_boundaries management_units
## 23
## low_freq_combinations
## 84
## management_units
## 108
## other
## 14
# One way ANOVA method
res.anova.extant<-aov(n_extant_populations ~ defined_populations_simplified, data=ind2_data_anova)
res.anova.extant
## Call:
## aov(formula = n_extant_populations ~ defined_populations_simplified,
## data = ind2_data_anova)
##
## Terms:
## defined_populations_simplified Residuals
## Sum of Squares 42563.5 1507793.9
## Deg. of Freedom 11 763
##
## Residual standard error: 44.45378
## Estimated effects may be unbalanced
summary(res.anova.extant)
## Df Sum Sq Mean Sq F value Pr(>F)
## defined_populations_simplified 11 42564 3869 1.958 0.0298 *
## Residuals 763 1507794 1976
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Same one-way ANOVA for the effect of the method to define populations on the number of extant pops, but removing outliers (>200 pops):
# subset data without outliers
ind2_data_anova<- ind2_data %>%
filter(n_extant_populations<200)
# summary of n per variable
table(ind2_data_anova$defined_populations_simplified)
##
## adaptive_traits management_units
## 19
## eco_biogeo_proxies
## 36
## genetic_clusters
## 98
## genetic_clusters eco_biogeo_proxies
## 19
## genetic_clusters geographic_boundaries
## 69
## geographic_boundaries
## 248
## geographic_boundaries adaptive_traits
## 30
## geographic_boundaries eco_biogeo_proxies
## 20
## geographic_boundaries management_units
## 23
## low_freq_combinations
## 84
## management_units
## 108
## other
## 14
# One way ANOVA method
res.anova.extant<-aov(n_extant_populations ~ defined_populations_simplified, data=ind2_data_anova)
res.anova.extant
## Call:
## aov(formula = n_extant_populations ~ defined_populations_simplified,
## data = ind2_data_anova)
##
## Terms:
## defined_populations_simplified Residuals
## Sum of Squares 18929.5 358242.7
## Deg. of Freedom 11 756
##
## Residual standard error: 21.76846
## Estimated effects may be unbalanced
summary(res.anova.extant)
## Df Sum Sq Mean Sq F value Pr(>F)
## defined_populations_simplified 11 18929 1720.9 3.632 4.98e-05 ***
## Residuals 756 358243 473.9
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Since there are significant differences, check which pairs are actually
different (Tukey Honest Significant Differences):
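For reference, a minimal sketch of this post hoc step with base R's TukeyHSD() on simulated toy data (the method labels are reused only for readability; the counts are random and not the notebook's data):

```r
# Simulated toy data: number of extant populations under three methods
set.seed(1)
toy <- data.frame(
  method = factor(rep(c("genetic_clusters", "geographic_boundaries",
                        "management_units"), each = 20)),
  n_extant_populations = c(rpois(20, 5), rpois(20, 15), rpois(20, 6))
)

# One-way ANOVA, then Tukey's HSD for every pair of methods
fit   <- aov(n_extant_populations ~ method, data = toy)
tukey <- TukeyHSD(fit, which = "method", conf.level = 0.95)

# One row per pair: difference in means, 95% CI and adjusted p-value
print(tukey$method)
```

R's documentation for TukeyHSD() notes that its adjustment gives sensible intervals for mildly unbalanced designs; the groups compared here are quite unbalanced (e.g. 14 vs 248 records), so the pairwise results are best read with some caution.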
One-way ANOVA for the effect of the country on the number of extant pops, removing outliers >200 pops
# subset data without outliers
ind2_data_anova<- ind2_data %>%
filter(n_extant_populations<200)
# summary of n per variable
table(ind2_data_anova$country_assessment)
##
## australia belgium france japan mexico
## 81 81 55 50 83
## south_africa sweden united_states
## 120 114 184
# One way ANOVA method
res.anova.extant<-aov(n_extant_populations ~ country_assessment, data=ind2_data_anova)
res.anova.extant
## Call:
## aov(formula = n_extant_populations ~ country_assessment, data = ind2_data_anova)
##
## Terms:
## country_assessment Residuals
## Sum of Squares 33031.9 344140.2
## Deg. of Freedom 7 760
##
## Residual standard error: 21.27947
## Estimated effects may be unbalanced
summary(res.anova.extant)
## Df Sum Sq Mean Sq F value Pr(>F)
## country_assessment 7 33032 4719 10.42 1.56e-12 ***
## Residuals 760 344140 453
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Since there are significant differences, check which pairs are actually
different (Tukey Honest Significant Differences):
One-way ANOVA for the effect of the taxonomic group on the number of extant pops, removing outliers (>200 pops) and taxonomic groups with too few records:
# summary of n per variable
table(ind2_data$taxonomic_group)
##
## amphibian angiosperm bird bryophyte fish
## 49 222 87 4 57
## fungus gymnosperm invertebrate mammal other
## 1 17 133 134 18
## pteridophytes reptile
## 12 60
# subset data
ind2_data_anova<- ind2_data %>%
filter(n_extant_populations<200) %>%
filter(taxonomic_group %!in% c("fungus", "bryophyte", "other", "pteridophytes"))
# summary of n per variable
table(ind2_data_anova$taxonomic_group)
##
## amphibian angiosperm bird fish gymnosperm invertebrate
## 48 217 86 56 16 128
## mammal reptile
## 129 53
# One way ANOVA method
res.anova.extant<-aov(n_extant_populations ~ taxonomic_group, data=ind2_data_anova)
res.anova.extant
## Call:
## aov(formula = n_extant_populations ~ taxonomic_group, data = ind2_data_anova)
##
## Terms:
## taxonomic_group Residuals
## Sum of Squares 17549.4 343600.3
## Deg. of Freedom 7 725
##
## Residual standard error: 21.76997
## Estimated effects may be unbalanced
summary(res.anova.extant)
## Df Sum Sq Mean Sq F value Pr(>F)
## taxonomic_group 7 17549 2507.1 5.29 6.53e-06 ***
## Residuals 725 343600 473.9
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Since there are significant differences, check which pairs are actually
different (Tukey Honest Significant Differences):
Two-way ANOVA with interaction effect of the method to define populations and the country, removing outliers (>200 pops). Question: is the interaction model correct, or should it be additive?
## Call:
## aov(formula = n_extant_populations ~ defined_populations_simplified *
## country_assessment, data = ind2_data_anova)
##
## Terms:
## defined_populations_simplified country_assessment
## Sum of Squares 18929.47 22866.55
## Deg. of Freedom 11 7
## defined_populations_simplified:country_assessment Residuals
## Sum of Squares 25164.77 310211.34
## Deg. of Freedom 43 706
##
## Residual standard error: 20.9617
## 34 out of 96 effects not estimable
## Estimated effects may be unbalanced
## Df Sum Sq Mean Sq F value
## defined_populations_simplified 11 18929 1721 3.916
## country_assessment 7 22867 3267 7.434
## defined_populations_simplified:country_assessment 43 25165 585 1.332
## Residuals 706 310211 439
## Pr(>F)
## defined_populations_simplified 1.57e-05 ***
## country_assessment 1.21e-08 ***
## defined_populations_simplified:country_assessment 0.0793 .
## Residuals
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
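On the question of interaction versus additive model, one standard check is an F-test between the two nested models with anova(). A sketch on simulated toy data (hypothetical countries and counts, not the notebook's data):

```r
# Simulated toy data: two methods x three countries, 15 taxa per cell
set.seed(2)
toy <- expand.grid(
  method  = c("genetic_clusters", "geographic_boundaries"),
  country = c("mexico", "sweden", "japan"),
  rep     = 1:15
)
toy$n_extant_populations <- rpois(nrow(toy), lambda = 10 + 5 * (toy$country == "sweden"))

# Additive model (main effects only) and model with the method:country interaction
fit_additive    <- aov(n_extant_populations ~ method + country, data = toy)
fit_interaction <- aov(n_extant_populations ~ method * country, data = toy)

# F-test between the nested models: does the interaction explain enough extra variance?
comparison <- anova(fit_additive, fit_interaction)
print(comparison)
```

A non-significant comparison would support the simpler additive model; in the output above the interaction term has p = 0.0793, and 34 of 96 effects are not estimable because many country × method cells are empty. Also note that summary() on an aov fit reports sequential (type I) sums of squares, so with unbalanced data the main-effect tests depend on the order of the terms in the formula.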
Since there are significant differences, check which pairs are actually
different (Tukey Honest Significant Differences):
Two-way ANOVA with interaction effect of the method to define populations and the country, but keeping only groups with enough n
# variables with enough n
enough_n<-ind2_data %>%
group_by(country_assessment, defined_populations_simplified) %>%
summarise(n=n()) %>%
filter(n>=15)
## `summarise()` has grouped output by 'country_assessment'. You can override
## using the `.groups` argument.
# subset data without outliers, keeping only the country x method
# combinations listed in enough_n (i.e. those with n >= 15)
ind2_data_anova<- ind2_data %>%
filter(n_extant_populations<200) %>%
semi_join(enough_n, by = c("country_assessment", "defined_populations_simplified"))
# summary of n per variable
ind2_data_anova %>%
group_by(country_assessment, defined_populations_simplified) %>% summarise(n=n())
## `summarise()` has grouped output by 'country_assessment'. You can override
## using the `.groups` argument.
# Two-way ANOVA with interaction effect of the method to define populations and the country
res.anova.extant<-aov(n_extant_populations ~ defined_populations_simplified * country_assessment , data=ind2_data_anova)
res.anova.extant
## Call:
## aov(formula = n_extant_populations ~ defined_populations_simplified *
## country_assessment, data = ind2_data_anova)
##
## Terms:
## defined_populations_simplified country_assessment
## Sum of Squares 21480.80 19226.60
## Deg. of Freedom 8 5
## defined_populations_simplified:country_assessment Residuals
## Sum of Squares 834.45 234646.92
## Deg. of Freedom 3 531
##
## Residual standard error: 21.02133
## 55 out of 72 effects not estimable
## Estimated effects may be unbalanced
summary(res.anova.extant)
## Df Sum Sq Mean Sq F value
## defined_populations_simplified 8 21481 2685 6.076
## country_assessment 5 19227 3845 8.702
## defined_populations_simplified:country_assessment 3 834 278 0.629
## Residuals 531 234647 442
## Pr(>F)
## defined_populations_simplified 1.69e-07 ***
## country_assessment 6.07e-08 ***
## defined_populations_simplified:country_assessment 0.596
## Residuals
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Since there are significant differences, check which pairs are actually
different (Tukey Honest Significant Differences):
See the distribution of the number of extinct populations:
Exclude outliers (>200 populations)
How does the number of populations vary by country? (excluding outliers: >200 pops)
By method to define populations? (excluding outliers: >200 pops), using the simplified method categories for easier visualization:
Number of populations by taxonomic group:
Taxonomic group and method:
Country and method:
Number of populations by taxonomic group and range type:
Number of populations by taxonomic group and global IUCN:
By method
By method. Sweden and US separately because they have too many pops.
By risk status, zooming in to fewer n of pops. Sweden and US separately
because they have too many pops.
We have NA values because in some cases the number of extinct populations is unknown, so indicator 2 cannot be computed.
Total records with NA in extant populations:
## [1] 18
Which ones?
Total records with NA in extinct populations:
## [1] 347
Do taxa with NA for extant also have NA for extinct?
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [16] TRUE TRUE TRUE
So, out of the 794 records, 347 have NA in n_extinct and 18 have NA in n_extant. All 18 of the latter also have NA in n_extinct.
QUESTION: should we manually check that the NAs are correct in both extinct and extant pops? (the cleaning script only checks for 0, not NAs)
So in total there are 347 records with NA in n_extant, n_extinct or both, which is 43.7% of the total number of records. QUESTION: how should we handle this when estimating indicator 2? A) we can’t estimate indicator 2 for those records, or B) we assume that an NA in n_extinct means 0, so indicator 2 = 1 (this would still leave the 18 records that also lack n_extant without a value).
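A minimal sketch of the two options, assuming indicator 2 = extant / (extant + extinct) and using hypothetical toy counts. It also reproduces the NA bookkeeping above (counting NAs and checking whether every record with NA extant also has NA extinct):

```r
# Hypothetical toy counts, one element per record
n_extant  <- c(10, 4, NA, 6)
n_extinct <- c(2, NA, NA, 0)

# How many records lack each count?
n_na_extant  <- sum(is.na(n_extant))
n_na_extinct <- sum(is.na(n_extinct))

# Do all records with NA extant also have NA extinct?
all_overlap <- all(is.na(n_extinct[is.na(n_extant)]))

# Option A: leave indicator 2 missing wherever either count is unknown
ind2_A <- n_extant / (n_extant + n_extinct)

# Option B: treat an unknown number of extinct populations as 0
ind2_B <- n_extant / (n_extant + ifelse(is.na(n_extinct), 0, n_extinct))

print(rbind(option_A = ind2_A, option_B = ind2_B))
```

Under option B, records with an unknown number of extant populations still yield NA.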
By country
By taxonomic group
By method to define pops
By method to define pops and country
By taxonomic group and country
Some taxa were assessed more than once, for example to account for different ways of delimiting populations. Create a subset of them, excluding records with missing data in indicator2 (due to missing data in the number of populations).
In total there are 73 multiassessed records from 35 taxa. Note that these can include missing data in the number of populations, which prevents estimating indicator 2.
To be able to visualize the missing data, the following plot changes NA to -1.
Variation in the number of extant populations by assessment:
Same plot, but excluding Bombus terricola’s massive variation:
Now for extinct populations (NA transformed as -1 for visualization purposes):
See you later, Bombus terricola
This is how much the values of indicator2 vary within multiassessed taxa (taxa shown without values have missing data in the number of populations, so indicator 2 can’t be estimated for them):
For exploratory purposes, unless stated otherwise, the analyses below use a subset of the data including only taxa assessed a single time, plus the first record of those assessed multiple times.
For the taxa that do have data, this is how the values of indicator2 are distributed:
Visualizing by country
Visualizing by taxonomic group:
Same, as a boxplot:
Zoom in on invertebrates by country:
Visualizing by IUCN:
Visualizing by range type:
Visualizing by rarity:
By population method
Same, boxplot version:
Facet by country
Facet by country and IUCN
Value of indicator 2 within a single country, by method to define populations, taxonomic group and IUCN category, including only one assessment per multiassessed taxon.
Same, colouring by country
By global IUCN risk
By population definition method:
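Run a quasibinomial GLM of indicator 2 on the number of extant populations, the method to define populations and their interaction (first removing missing data):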
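For readability, a sketch of an equivalent model specification on simulated toy data. It passes the data frame through data = instead of prefixing every variable with ind2_data_wo_missing$, and uses a * b, which R expands to a + b + a:b, so the separate main-effect terms in the call echoed in the output are redundant. The toy columns and values are hypothetical:

```r
# Simulated toy data (hypothetical values)
set.seed(3)
toy <- data.frame(
  method = factor(rep(c("genetic_clusters", "geographic_boundaries"), each = 30)),
  n_extant_populations  = rpois(60, 8) + 1,
  n_extinct_populations = rpois(60, 2)
)
toy$indicator2 <- toy$n_extant_populations /
  (toy$n_extant_populations + toy$n_extinct_populations)

# a * b expands to a + b + a:b
fit <- glm(indicator2 ~ n_extant_populations * method,
           family = quasibinomial, data = toy)

# Writing the main effects out as well gives the same model
fit_long <- glm(indicator2 ~ n_extant_populations + method +
                  n_extant_populations * method,
                family = quasibinomial, data = toy)
stopifnot(isTRUE(all.equal(coef(fit), coef(fit_long))))

print(summary(fit))
```

Because quasibinomial is a quasi-likelihood family, no AIC is available (hence AIC: NA in the output). If each taxon's proportion should count according to its number of populations, glm() also accepts weights (e.g. extant plus extinct populations as the number of trials); whether that suits this analysis is left open here.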
##
## adaptive_traits management_units
## 19
## eco_biogeo_proxies
## 25
## genetic_clusters
## 44
## genetic_clusters eco_biogeo_proxies
## 10
## genetic_clusters geographic_boundaries
## 39
## geographic_boundaries
## 139
## geographic_boundaries adaptive_traits
## 30
## geographic_boundaries eco_biogeo_proxies
## 15
## geographic_boundaries management_units
## 16
## low_freq_combinations
## 63
## management_units
## 38
## other
## 7
##
## Call: glm(formula = ind2_data_wo_missing$indicator2 ~ ind2_data_wo_missing$n_extant_populations +
## ind2_data_wo_missing$defined_populations_simplified + ind2_data_wo_missing$n_extant_populations *
## ind2_data_wo_missing$defined_populations_simplified, family = "quasibinomial")
##
## Coefficients:
## (Intercept)
## 2.438899
## ind2_data_wo_missing$n_extant_populations
## 0.039827
## ind2_data_wo_missing$defined_populations_simplifiedeco_biogeo_proxies
## -1.150927
## ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters
## 0.348388
## ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters eco_biogeo_proxies
## 1.404638
## ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters geographic_boundaries
## -0.682750
## ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries
## -1.139870
## ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries adaptive_traits
## 0.576490
## ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries eco_biogeo_proxies
## -0.996594
## ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries management_units
## -0.731998
## ind2_data_wo_missing$defined_populations_simplifiedlow_freq_combinations
## -0.620621
## ind2_data_wo_missing$defined_populations_simplifiedmanagement_units
## -1.992630
## ind2_data_wo_missing$defined_populations_simplifiedother
## -2.509699
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedeco_biogeo_proxies
## -0.035179
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters
## -0.097087
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters eco_biogeo_proxies
## -0.172294
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters geographic_boundaries
## -0.045113
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries
## -0.042280
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries adaptive_traits
## -0.031073
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries eco_biogeo_proxies
## -0.042619
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries management_units
## 0.006636
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedlow_freq_combinations
## -0.048541
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedmanagement_units
## -0.033672
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedother
## 0.655693
##
## Degrees of Freedom: 444 Total (i.e. Null); 421 Residual
## Null Deviance: 195.2
## Residual Deviance: 165.5 AIC: NA
##
## Call:
## glm(formula = ind2_data_wo_missing$indicator2 ~ ind2_data_wo_missing$n_extant_populations +
## ind2_data_wo_missing$defined_populations_simplified + ind2_data_wo_missing$n_extant_populations *
## ind2_data_wo_missing$defined_populations_simplified, family = "quasibinomial")
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.7551 -0.3135 0.3176 0.5652 0.9923
##
## Coefficients:
## Estimate
## (Intercept) 2.438899
## ind2_data_wo_missing$n_extant_populations 0.039827
## ind2_data_wo_missing$defined_populations_simplifiedeco_biogeo_proxies -1.150927
## ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters 0.348388
## ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters eco_biogeo_proxies 1.404638
## ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters geographic_boundaries -0.682750
## ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries -1.139870
## ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries adaptive_traits 0.576490
## ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries eco_biogeo_proxies -0.996594
## ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries management_units -0.731998
## ind2_data_wo_missing$defined_populations_simplifiedlow_freq_combinations -0.620621
## ind2_data_wo_missing$defined_populations_simplifiedmanagement_units -1.992630
## ind2_data_wo_missing$defined_populations_simplifiedother -2.509699
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedeco_biogeo_proxies -0.035179
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters -0.097087
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters eco_biogeo_proxies -0.172294
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters geographic_boundaries -0.045113
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries -0.042280
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries adaptive_traits -0.031073
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries eco_biogeo_proxies -0.042619
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries management_units 0.006636
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedlow_freq_combinations -0.048541
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedmanagement_units -0.033672
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedother 0.655693
## Std. Error
## (Intercept) 0.768536
## ind2_data_wo_missing$n_extant_populations 0.120075
## ind2_data_wo_missing$defined_populations_simplifiedeco_biogeo_proxies 0.848557
## ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters 0.925289
## ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters eco_biogeo_proxies 1.497357
## ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters geographic_boundaries 0.823093
## ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries 0.782282
## ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries adaptive_traits 0.979936
## ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries eco_biogeo_proxies 0.884590
## ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries management_units 0.989659
## ind2_data_wo_missing$defined_populations_simplifiedlow_freq_combinations 0.806642
## ind2_data_wo_missing$defined_populations_simplifiedmanagement_units 0.815451
## ind2_data_wo_missing$defined_populations_simplifiedother 1.222196
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedeco_biogeo_proxies 0.120242
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters 0.169784
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters eco_biogeo_proxies 0.132777
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters geographic_boundaries 0.120112
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries 0.120168
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries adaptive_traits 0.125158
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries eco_biogeo_proxies 0.120143
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries management_units 0.169104
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedlow_freq_combinations 0.120281
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedmanagement_units 0.123390
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedother 0.604276
## t value
## (Intercept) 3.173
## ind2_data_wo_missing$n_extant_populations 0.332
## ind2_data_wo_missing$defined_populations_simplifiedeco_biogeo_proxies -1.356
## ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters 0.377
## ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters eco_biogeo_proxies 0.938
## ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters geographic_boundaries -0.829
## ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries -1.457
## ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries adaptive_traits 0.588
## ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries eco_biogeo_proxies -1.127
## ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries management_units -0.740
## ind2_data_wo_missing$defined_populations_simplifiedlow_freq_combinations -0.769
## ind2_data_wo_missing$defined_populations_simplifiedmanagement_units -2.444
## ind2_data_wo_missing$defined_populations_simplifiedother -2.053
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedeco_biogeo_proxies -0.293
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters -0.572
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters eco_biogeo_proxies -1.298
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters geographic_boundaries -0.376
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries -0.352
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries adaptive_traits -0.248
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries eco_biogeo_proxies -0.355
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries management_units 0.039
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedlow_freq_combinations -0.404
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedmanagement_units -0.273
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedother 1.085
## Pr(>|t|)
## (Intercept) 0.00162
## ind2_data_wo_missing$n_extant_populations 0.74029
## ind2_data_wo_missing$defined_populations_simplifiedeco_biogeo_proxies 0.17572
## ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters 0.70672
## ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters eco_biogeo_proxies 0.34874
## ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters geographic_boundaries 0.40730
## ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries 0.14583
## ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries adaptive_traits 0.55665
## ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries eco_biogeo_proxies 0.26055
## ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries management_units 0.45993
## ind2_data_wo_missing$defined_populations_simplifiedlow_freq_combinations 0.44209
## ind2_data_wo_missing$defined_populations_simplifiedmanagement_units 0.01495
## ind2_data_wo_missing$defined_populations_simplifiedother 0.04065
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedeco_biogeo_proxies 0.77000
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters 0.56775
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters eco_biogeo_proxies 0.19513
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters geographic_boundaries 0.70741
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries 0.72514
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries adaptive_traits 0.80405
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries eco_biogeo_proxies 0.72297
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries management_units 0.96872
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedlow_freq_combinations 0.68674
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedmanagement_units 0.78507
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedother 0.27850
##
## (Intercept) **
## ind2_data_wo_missing$n_extant_populations
## ind2_data_wo_missing$defined_populations_simplifiedeco_biogeo_proxies
## ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters
## ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters eco_biogeo_proxies
## ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters geographic_boundaries
## ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries
## ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries adaptive_traits
## ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries eco_biogeo_proxies
## ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries management_units
## ind2_data_wo_missing$defined_populations_simplifiedlow_freq_combinations
## ind2_data_wo_missing$defined_populations_simplifiedmanagement_units *
## ind2_data_wo_missing$defined_populations_simplifiedother *
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedeco_biogeo_proxies
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters eco_biogeo_proxies
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters geographic_boundaries
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries adaptive_traits
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries eco_biogeo_proxies
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries management_units
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedlow_freq_combinations
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedmanagement_units
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedother
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasibinomial family taken to be 0.3948779)
##
## Null deviance: 195.17 on 444 degrees of freedom
## Residual deviance: 165.50 on 421 degrees of freedom
## AIC: NA
##
## Number of Fisher Scoring iterations: 6
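For quasi-families, glm() does not fix the dispersion at 1. summary() estimates it from the Pearson residuals, and that is where the 0.3948779 above comes from. Because quasi-families define no likelihood, the AIC is reported as NA. The sketch below uses simulated proportions rather than the notebook's data, and its variable names are illustrative only. It shows that the reported dispersion equals the sum of squared Pearson residuals divided by the residual degrees of freedom:

```r
# Simulated illustration (not the notebook's data): proportions against a count predictor
set.seed(1)
n_pops <- rpois(60, lambda = 10) + 1
prop   <- runif(60)

fit <- glm(prop ~ n_pops, family = quasibinomial)

# Dispersion as reported by summary() for quasi families
phi_summary <- summary(fit)$dispersion
# Pearson estimate: sum of squared Pearson residuals / residual degrees of freedom
phi_pearson <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)

c(summary = phi_summary, pearson = phi_pearson)
fit$aic  # NA, because quasi families have no likelihood
```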
Put the plots of the number of populations and the indicator 2 range side by side (excluding outliers with >200 populations):
Add the scatter plot of indicator 2 against the number of extant populations as a third panel:
One-way ANOVA for the effect of the method used to define populations on indicator 2, removing the extreme outlier (>1,000 populations):
# subset data without massive outlier
ind2_data_anova<- ind2_data %>%
filter(indicator2<1000)
# summary of n per variable
table(ind2_data_anova$defined_populations_simplified)
##
## adaptive_traits management_units
## 19
## eco_biogeo_proxies
## 25
## genetic_clusters
## 44
## genetic_clusters eco_biogeo_proxies
## 10
## genetic_clusters geographic_boundaries
## 39
## geographic_boundaries
## 141
## geographic_boundaries adaptive_traits
## 30
## geographic_boundaries eco_biogeo_proxies
## 15
## geographic_boundaries management_units
## 16
## low_freq_combinations
## 63
## management_units
## 38
## other
## 7
# One way ANOVA method
res.anova.extant<-aov(indicator2 ~ defined_populations_simplified, data=ind2_data_anova)
res.anova.extant
## Call:
## aov(formula = indicator2 ~ defined_populations_simplified, data = ind2_data_anova)
##
## Terms:
## defined_populations_simplified Residuals
## Sum of Squares 3.296467 24.662583
## Deg. of Freedom 11 435
##
## Residual standard error: 0.2381084
## Estimated effects may be unbalanced
summary(res.anova.extant)
## Df Sum Sq Mean Sq F value Pr(>F)
## defined_populations_simplified 11 3.296 0.2997 5.286 7.34e-08 ***
## Residuals 435 24.663 0.0567
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
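The F statistic and p-value in this summary follow directly from the sums of squares printed by res.anova.extant. Each sum of squares is divided by its degrees of freedom, and F is the ratio of the two mean squares. The check below uses the printed values. The eta-squared line is an extra descriptive measure (the share of variance attributed to the method term) and is not part of the original output:

```r
# Values copied from the printed aov() object above
ss_method <- 3.296467;  df_method <- 11
ss_resid  <- 24.662583; df_resid  <- 435

ms_method <- ss_method / df_method              # mean square for the method term
ms_resid  <- ss_resid / df_resid                # residual mean square
f_value   <- ms_method / ms_resid               # F statistic
p_value   <- pf(f_value, df_method, df_resid, lower.tail = FALSE)
resid_se  <- sqrt(ms_resid)                     # "Residual standard error"
eta_sq    <- ss_method / (ss_method + ss_resid) # share of total variance for method

signif(c(F = f_value, p = p_value, resid_se = resid_se, eta_sq = eta_sq), 4)
```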
Same one-way ANOVA for the effect of the method used to define populations on indicator 2, but removing outliers (>200 populations):
# subset data without outliers
ind2_data_anova<- ind2_data %>%
filter(indicator2<200)
# summary of n per variable
table(ind2_data_anova$defined_populations_simplified)
##
## adaptive_traits management_units
## 19
## eco_biogeo_proxies
## 25
## genetic_clusters
## 44
## genetic_clusters eco_biogeo_proxies
## 10
## genetic_clusters geographic_boundaries
## 39
## geographic_boundaries
## 141
## geographic_boundaries adaptive_traits
## 30
## geographic_boundaries eco_biogeo_proxies
## 15
## geographic_boundaries management_units
## 16
## low_freq_combinations
## 63
## management_units
## 38
## other
## 7
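These group sizes are identical to those obtained with the indicator2<1000 filter, and the ANOVA that follows is identical as well. indicator2 is a proportion (it is the response of the quasibinomial GLM above), so thresholds of 200 or 1,000 never bind. filter() only drops the rows where indicator2 is NA. If the intent is to exclude taxa assessed with more than 200 populations, the condition presumably has to act on a population count instead. The sketch below assumes that count is n_extant_populations (the predictor in the GLM above) and uses made-up toy rows in place of ind2_data:

```r
# Toy rows standing in for ind2_data (hypothetical values)
toy <- data.frame(
  indicator2           = c(0.8, 1.0, NA, 0.5, 0.9),
  n_extant_populations = c(12, 350, 4, 1500, 40)
)

# Threshold on the proportion: never binds, only the NA row is dropped
by_indicator <- subset(toy, indicator2 < 200)

# Threshold on the population count (assumed intent): drops taxa with 200 or more populations
by_count <- subset(toy, !is.na(indicator2) & n_extant_populations < 200)

nrow(by_indicator)
nrow(by_count)
```

In the notebook's dplyr style, the second subset would read filter(!is.na(indicator2), n_extant_populations < 200).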
# One way ANOVA method
res.anova.extant<-aov(indicator2 ~ defined_populations_simplified, data=ind2_data_anova)
res.anova.extant
## Call:
## aov(formula = indicator2 ~ defined_populations_simplified, data = ind2_data_anova)
##
## Terms:
## defined_populations_simplified Residuals
## Sum of Squares 3.296467 24.662583
## Deg. of Freedom 11 435
##
## Residual standard error: 0.2381084
## Estimated effects may be unbalanced
summary(res.anova.extant)
## Df Sum Sq Mean Sq F value Pr(>F)
## defined_populations_simplified 11 3.296 0.2997 5.286 7.34e-08 ***
## Residuals 435 24.663 0.0567
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Since there are significant differences, check which pairs actually differ (Tukey Honest Significant Differences):
##                                                                   diff         lwr         upr        p adj
## management_units-adaptive_traits management_units           -0.3137825 -0.53362470 -0.09394021 2.282910e-04
## geographic_boundaries-genetic_clusters                      -0.1528869 -0.28799841 -0.01777543 1.204372e-02
## management_units-genetic_clusters                           -0.3142083 -0.48748139 -0.14093522 3.442524e-07
## management_units-genetic_clusters geographic_boundaries     -0.2125950 -0.39094113 -0.03424896 5.807306e-03
## geographic_boundaries adaptive_traits-geographic_boundaries  0.1762672  0.01895216  0.33358220 1.369938e-02
## management_units-geographic_boundaries                      -0.1613214 -0.30433174 -0.01831103 1.254869e-02
## management_units-geographic_boundaries adaptive_traits      -0.3375886 -0.52868138 -0.14649575 8.116474e-07
## management_units-geographic_boundaries management_units     -0.2492383 -0.48241624 -0.01606041 2.442980e-02
## management_units-low_freq_combinations                      -0.2270969 -0.38780613 -0.06638758 2.822933e-04
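The code that builds this table is not shown. Every listed pair has an adjusted p-value below 0.05, so the table appears to be the output of TukeyHSD() filtered down to the significant comparisons. A minimal sketch of that step follows. It uses the built-in PlantGrowth data in place of res.anova.extant, and the filtering threshold is an assumption:

```r
data(PlantGrowth)  # built-in: plant weight for a control and two treatments

fit <- aov(weight ~ group, data = PlantGrowth)
tk  <- TukeyHSD(fit)               # list with one matrix per factor in the model

tk_df <- as.data.frame(tk$group)   # columns: diff, lwr, upr, p adj
sig_pairs <- tk_df[tk_df[["p adj"]] < 0.05, ]  # keep only significant pairs
sig_pairs
```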
One-way ANOVA for the effect of the country on indicator 2, removing outliers >200 pops
# subset data without outliers
ind2_data_anova<- ind2_data %>%
filter(indicator2<200)
# summary of n per variable
table(ind2_data_anova$country_assessment)
##
## australia belgium france japan mexico
## 26 20 27 50 23
## south_africa sweden united_states
## 90 72 139
# One way ANOVA method
res.anova.extant<-aov(indicator2 ~ country_assessment, data=ind2_data_anova)
res.anova.extant
## Call:
## aov(formula = indicator2 ~ country_assessment, data = ind2_data_anova)
##
## Terms:
## country_assessment Residuals
## Sum of Squares 8.116879 19.842171
## Deg. of Freedom 7 439
##
## Residual standard error: 0.2125995
## Estimated effects may be unbalanced
summary(res.anova.extant)
## Df Sum Sq Mean Sq F value Pr(>F)
## country_assessment 7 8.117 1.1596 25.66 <2e-16 ***
## Residuals 439 19.842 0.0452
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Since there are significant differences, check which pairs actually differ (Tukey Honest Significant Differences):
##                                  diff         lwr         upr        p adj
## belgium-australia          -0.4704909 -0.66307208 -0.27790972 0.000000e+00
## sweden-australia           -0.2576349 -0.40578324 -0.10948661 5.169073e-06
## france-belgium              0.4095050  0.21848066  0.60052928 5.101817e-09
## japan-belgium               0.5158197  0.34450853  0.68713082 0.000000e+00
## mexico-belgium              0.4969288  0.29896220  0.69489539 0.000000e+00
## south_africa-belgium        0.5230411  0.36297601  0.68310624 0.000000e+00
## sweden-belgium              0.2128560  0.04919344  0.37651850 2.203044e-03
## united_states-belgium       0.3867284  0.23187787  0.54157897 0.000000e+00
## sweden-france              -0.1966490 -0.34276778 -0.05053021 1.273976e-03
## sweden-japan               -0.3029637 -0.42216068 -0.18376672 0.000000e+00
## united_states-japan        -0.1290913 -0.23586761 -0.02231490 6.291789e-03
## sweden-mexico              -0.2840728 -0.43915726 -0.12898838 1.178200e-06
## sweden-south_africa        -0.3101852 -0.41256314 -0.20780716 0.000000e+00
## united_states-south_africa -0.1363127 -0.22391706 -0.04870835 7.872918e-05
## united_states-sweden        0.1738724  0.07985593  0.26788897 8.825632e-07
One-way ANOVA for the effect of the taxonomic group on indicator 2, removing outliers (>200 populations) and taxonomic groups with too few records:
# summary of n per variable
table(ind2_data$taxonomic_group)
##
## amphibian angiosperm bird bryophyte fish
## 49 222 87 4 57
## fungus gymnosperm invertebrate mammal other
## 1 17 133 134 18
## pteridophytes reptile
## 12 60
# subset data
ind2_data_anova<- ind2_data %>%
filter(indicator2<200) %>%
filter(taxonomic_group %!in% c("fungus", "bryophyte", "other", "pteridophytes"))
# summary of n per variable
table(ind2_data_anova$taxonomic_group)
##
## amphibian angiosperm bird fish gymnosperm invertebrate
## 36 132 43 40 9 69
## mammal reptile
## 64 31
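%!in% is not part of base R. It comes from the notebook's own helper functions, whose definition is not shown in this section. A common one-line definition, assumed here for illustration, negates %in%:

```r
# Assumed definition: the notebook's own helper may be written differently
`%!in%` <- Negate(`%in%`)

taxa     <- c("fungus", "bird", "mammal", "other", "pteridophytes")
excluded <- c("fungus", "bryophyte", "other", "pteridophytes")
taxa[taxa %!in% excluded]  # keeps "bird" and "mammal"
```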
# One way ANOVA method
res.anova.extant<-aov(indicator2 ~ taxonomic_group, data=ind2_data_anova)
res.anova.extant
## Call:
## aov(formula = indicator2 ~ taxonomic_group, data = ind2_data_anova)
##
## Terms:
## taxonomic_group Residuals
## Sum of Squares 3.43812 23.70842
## Deg. of Freedom 7 416
##
## Residual standard error: 0.2387287
## Estimated effects may be unbalanced
summary(res.anova.extant)
## Df Sum Sq Mean Sq F value Pr(>F)
## taxonomic_group 7 3.438 0.4912 8.618 6.96e-10 ***
## Residuals 416 23.708 0.0570
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Since there are significant differences, check which pairs actually differ (Tukey Honest Significant Differences):
##                               diff         lwr           upr        p adj
## invertebrate-amphibian  -0.2102735 -0.35979862 -0.0607483793 5.984864e-04
## invertebrate-angiosperm -0.1908065 -0.29884585 -0.0827672144 3.441991e-06
## invertebrate-bird       -0.2025492 -0.34385030 -0.0612480996 4.208169e-04
## invertebrate-fish       -0.1449407 -0.28946938 -0.0004120327 4.876394e-02
## invertebrate-gymnosperm -0.3326302 -0.59037907 -0.0748812894 2.490181e-03
## mammal-invertebrate      0.2887679  0.16255422  0.4149816709 3.237879e-10
## reptile-invertebrate     0.2412486  0.08399883  0.3984982978 1.079755e-04
Two-way ANOVA with interaction between the method used to define populations and the country, removing outliers (>200 populations). Open question: is the interaction model correct, or should it be additive?
## country_assessment defined_populations_simplified            n
## australia          genetic_clusters                          5
## australia          genetic_clusters geographic_boundaries    2
## australia          geographic_boundaries                    11
## australia          geographic_boundaries management_units    5
## australia          low_freq_combinations                     2
## australia          management_units                          1
## belgium            low_freq_combinations                     2
## belgium            management_units                         18
## france             genetic_clusters                          1
## france             genetic_clusters eco_biogeo_proxies       2
## france             genetic_clusters geographic_boundaries    1
## france             geographic_boundaries                     3
## france             geographic_boundaries eco_biogeo_proxies  3
## france             low_freq_combinations                    15
## france             management_units                          2
## japan              adaptive_traits management_units         19
## japan              geographic_boundaries                     1
## japan              geographic_boundaries adaptive_traits    18
## japan              low_freq_combinations                    12
## mexico             genetic_clusters                          4
## mexico             genetic_clusters geographic_boundaries    3
## mexico             geographic_boundaries                     4
## mexico             geographic_boundaries adaptive_traits     9
## mexico             low_freq_combinations                     2
## mexico             other                                     1
## south_africa       eco_biogeo_proxies                        3
## south_africa       genetic_clusters                         25
## south_africa       genetic_clusters geographic_boundaries   18
## south_africa       geographic_boundaries                    30
## south_africa       geographic_boundaries management_units    2
## south_africa       low_freq_combinations                     5
## south_africa       management_units                          6
## south_africa       other                                     1
## sweden             eco_biogeo_proxies                        2
## sweden             genetic_clusters                          3
## sweden             genetic_clusters geographic_boundaries    8
## sweden             geographic_boundaries                    46
## sweden             geographic_boundaries adaptive_traits     3
## sweden             geographic_boundaries management_units    3
## sweden             low_freq_combinations                     6
## sweden             management_units                          1
## united_states      eco_biogeo_proxies                       20
## united_states      genetic_clusters                          6
## united_states      genetic_clusters eco_biogeo_proxies       8
## united_states      genetic_clusters geographic_boundaries    7
## united_states      geographic_boundaries                    46
## united_states      geographic_boundaries eco_biogeo_proxies 12
## united_states      geographic_boundaries management_units    6
## united_states      low_freq_combinations                    19
## united_states      management_units                         10
## united_states      other                                     5
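The full interaction model has one coefficient per country-method cell. With 12 methods and 8 countries that gives 96 cells, but only the 51 combinations listed above contain data. The remaining 96 - 51 = 45 coefficients cannot be estimated, which matches the "45 out of 96 effects not estimable" note in the aov() output. The sketch below shows the same relationship on a small toy design with empty cells. The data are made up and the column names are hypothetical:

```r
# Toy unbalanced design with empty country x method cells (hypothetical data)
set.seed(42)
toy <- data.frame(
  country = factor(c("a", "a", "a", "b", "b", "c", "c", "c")),
  method  = factor(c("x", "y", "y", "x", "x", "y", "z", "z")),
  y       = rnorm(8)
)

cells      <- xtabs(~ country + method, data = toy)
n_cells    <- length(cells)      # all country x method cells
n_occupied <- sum(cells > 0)     # cells that contain data

fit <- lm(y ~ method * country, data = toy)
n_coef      <- length(coef(fit)) # one coefficient per cell in the full interaction model
n_estimable <- fit$rank          # coefficients that are not NA (aliased)

c(cells = n_cells, occupied = n_occupied, coef = n_coef, estimable = n_estimable)
```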
## Call:
## aov(formula = indicator2 ~ defined_populations_simplified * country_assessment,
## data = ind2_data_anova)
##
## Terms:
## defined_populations_simplified country_assessment
## Sum of Squares 3.296467 5.315673
## Deg. of Freedom 11 7
## defined_populations_simplified:country_assessment Residuals
## Sum of Squares 2.036565 17.310345
## Deg. of Freedom 32 396
##
## Residual standard error: 0.2090765
## 45 out of 96 effects not estimable
## Estimated effects may be unbalanced
## Df Sum Sq Mean Sq F value
## defined_populations_simplified 11 3.296 0.2997 6.856
## country_assessment 7 5.316 0.7594 17.372
## defined_populations_simplified:country_assessment 32 2.037 0.0636 1.456
## Residuals 396 17.310 0.0437
## Pr(>F)
## defined_populations_simplified 1.33e-10 ***
## country_assessment < 2e-16 ***
## defined_populations_simplified:country_assessment 0.0554 .
## Residuals
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
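In this two-way table the country sum of squares is 5.316, compared with 8.117 when country is fitted alone. The difference arises because summary(aov()) reports sequential (type I) sums of squares: each term is adjusted only for the terms listed before it, so with unbalanced data the order of terms matters. The method term is listed first, which is why its sum of squares (3.296) matches the one-way ANOVA. The sketch below uses the built-in mtcars data, chosen only because its two factors are unbalanced. It uses drop1() to obtain each term adjusted for the other:

```r
# Built-in data used only as an unbalanced two-factor example
d <- transform(mtcars, cyl = factor(cyl), am = factor(am))

a_cyl_first <- anova(lm(mpg ~ cyl + am, data = d))  # sequential (type I) table
a_am_first  <- anova(lm(mpg ~ am + cyl, data = d))

ss_cyl_first  <- a_cyl_first[["Sum Sq"]][1]  # cyl, unadjusted
ss_cyl_second <- a_am_first[["Sum Sq"]][2]   # cyl, adjusted for am

# drop1() tests each term adjusted for all other terms in the additive model
d1 <- drop1(lm(mpg ~ cyl + am, data = d), test = "F")
d1
```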
Since there are significant differences, check which pairs actually differ (Tukey Honest Significant Differences):
##                                                                   diff         lwr         upr        p adj
## management_units-adaptive_traits management_units           -0.3137825 -0.50692434 -0.12064057 1.007108e-05
## management_units-eco_biogeo_proxies                         -0.1889437 -0.36596116 -0.01192624 2.486190e-02
## geographic_boundaries-genetic_clusters                      -0.1528869 -0.27158880 -0.03418504 1.683367e-03
## management_units-genetic_clusters                           -0.3142083 -0.46643697 -0.16197965 2.759523e-09
## management_units-genetic_clusters eco_biogeo_proxies        -0.2689522 -0.51325948 -0.02464487 1.716874e-02
## management_units-genetic_clusters geographic_boundaries     -0.2125950 -0.36928057 -0.05590951 6.482061e-04
## geographic_boundaries adaptive_traits-geographic_boundaries  0.1762672  0.03805844  0.31447592 1.994144e-03
## management_units-geographic_boundaries                      -0.1613214 -0.28696279 -0.03567998 1.776752e-03
## management_units-geographic_boundaries adaptive_traits      -0.3375886 -0.50547270 -0.16970443 8.116088e-09
## management_units-geographic_boundaries management_units     -0.2492383 -0.45409623 -0.04438042 4.284597e-03
## management_units-low_freq_combinations                      -0.2270969 -0.36828761 -0.08590611 1.320299e-05
Two-way ANOVA with interaction between the method used to define populations and the country, keeping only country-method combinations with at least 15 records:
# variables with enough n
enough_n<-ind2_data %>%
group_by(country_assessment, defined_populations_simplified) %>%
summarise(n=n()) %>%
filter(n>=15)
## `summarise()` has grouped output by 'country_assessment'. You can override
## using the `.groups` argument.
# subset data without outliers, keeping only the country-method combinations listed in enough_n
# semi_join() keeps the rows whose (country_assessment, defined_populations_simplified) pair appears in enough_n
ind2_data_anova<- ind2_data %>%
filter(indicator2<200) %>%
semi_join(enough_n, by = c("country_assessment", "defined_populations_simplified"))
# summary of n per variable
ind2_data_anova %>%
group_by(country_assessment, defined_populations_simplified) %>% summarise(n=n())
## `summarise()` has grouped output by 'country_assessment'. You can override
## using the `.groups` argument.
# Two-way ANOVA with interaction effect of the method to define populations and the country
res.anova.extant<-aov(indicator2 ~ defined_populations_simplified * country_assessment , data=ind2_data_anova)
res.anova.extant
## Call:
## aov(formula = indicator2 ~ defined_populations_simplified * country_assessment,
## data = ind2_data_anova)
##
## Terms:
## defined_populations_simplified country_assessment
## Sum of Squares 4.245618 2.775843
## Deg. of Freedom 8 5
## defined_populations_simplified:country_assessment Residuals
## Sum of Squares 0.489659 12.931021
## Deg. of Freedom 3 300
##
## Residual standard error: 0.2076136
## 55 out of 72 effects not estimable
## Estimated effects may be unbalanced
summary(res.anova.extant)
## Df Sum Sq Mean Sq F value
## defined_populations_simplified 8 4.246 0.5307 12.312
## country_assessment 5 2.776 0.5552 12.880
## defined_populations_simplified:country_assessment 3 0.490 0.1632 3.787
## Residuals 300 12.931 0.0431
## Pr(>F)
## defined_populations_simplified 3.00e-15 ***
## country_assessment 2.34e-11 ***
## defined_populations_simplified:country_assessment 0.0108 *
## Residuals
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
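On the question raised earlier of whether the interaction or the additive model is appropriate: the interaction row of the sequential ANOVA table is itself the F test that compares the additive model with the interaction model. Here that gives p = 0.0108 for the well-sampled subset, compared with p = 0.0554 when all combinations are included. The same test can be run explicitly with anova() on two nested fits. The sketch below uses the built-in warpbreaks data as a stand-in for ind2_data_anova:

```r
data(warpbreaks)  # built-in: breaks by wool (2 levels) and tension (3 levels)

m_add <- lm(breaks ~ wool + tension, data = warpbreaks)  # additive model
m_int <- lm(breaks ~ wool * tension, data = warpbreaks)  # model with interaction

cmp <- anova(m_add, m_int)  # F test for adding the interaction term
cmp

# The interaction row of the sequential table carries the same F test
seq_tab <- anova(m_int)
seq_tab
```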
Since there are significant differences, check which pairs actually differ (Tukey Honest Significant Differences):
##                                                                   diff         lwr          upr        p adj
## geographic_boundaries-adaptive_traits management_units      -0.1596277 -0.31850793 -0.000747464 4.790991e-02
## management_units-adaptive_traits management_units           -0.5129239 -0.72629623 -0.299551590 2.533862e-11
## management_units-eco_biogeo_proxies                         -0.4225076 -0.63326893 -0.211746370 4.663398e-08
## geographic_boundaries-genetic_clusters                      -0.1589618 -0.28334613 -0.034577379 2.626670e-03
## management_units-genetic_clusters                           -0.5122580 -0.70135131 -0.323164626 1.064038e-12
## management_units-genetic_clusters geographic_boundaries     -0.4584124 -0.65732106 -0.259503748 1.749125e-10
## geographic_boundaries adaptive_traits-geographic_boundaries  0.1798768  0.01717028  0.342583295 1.802999e-02
## management_units-geographic_boundaries                      -0.3532962 -0.51600272 -0.190589705 2.232140e-09
## management_units-geographic_boundaries adaptive_traits      -0.5331730 -0.74940951 -0.316936491 7.973733e-12
## management_units-geographic_boundaries eco_biogeo_proxies   -0.3618975 -0.60365722 -0.120137689 1.513420e-04
## management_units-low_freq_combinations                      -0.4056763 -0.59476968 -0.216582993 3.626228e-09
One-way ANOVA for the effect of the method used to define populations on indicator 2, removing the extreme outlier (>1,000 populations):
# subset data without massive outlier
ind2_data_anova<- ind2_data %>%
filter(indicator2<1000)
# summary of n per variable
table(ind2_data_anova$defined_populations_simplified)
##
## adaptive_traits management_units
## 19
## eco_biogeo_proxies
## 25
## genetic_clusters
## 44
## genetic_clusters eco_biogeo_proxies
## 10
## genetic_clusters geographic_boundaries
## 39
## geographic_boundaries
## 141
## geographic_boundaries adaptive_traits
## 30
## geographic_boundaries eco_biogeo_proxies
## 15
## geographic_boundaries management_units
## 16
## low_freq_combinations
## 63
## management_units
## 38
## other
## 7
# One way ANOVA method
res.anova.extant<-aov(indicator2 ~ defined_populations_simplified, data=ind2_data_anova)
summary(res.anova.extant)
## Df Sum Sq Mean Sq F value Pr(>F)
## defined_populations_simplified 11 3.296 0.2997 5.286 7.34e-08 ***
## Residuals 435 24.663 0.0567
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Same one-way ANOVA for the effect of the method used to define populations on indicator 2, removing outliers (>1000 populations):
# subset data without outliers
ind2_data_anova<- ind2_data %>%
filter(indicator2<1000)
# summary of n per variable
table(ind2_data_anova$defined_populations_simplified)
##
## adaptive_traits management_units
## 19
## eco_biogeo_proxies
## 25
## genetic_clusters
## 44
## genetic_clusters eco_biogeo_proxies
## 10
## genetic_clusters geographic_boundaries
## 39
## geographic_boundaries
## 141
## geographic_boundaries adaptive_traits
## 30
## geographic_boundaries eco_biogeo_proxies
## 15
## geographic_boundaries management_units
## 16
## low_freq_combinations
## 63
## management_units
## 38
## other
## 7
# One way ANOVA method
res.anova.extant<-aov(indicator2 ~ defined_populations_simplified, data=ind2_data_anova)
summary(res.anova.extant)
## Df Sum Sq Mean Sq F value Pr(>F)
## defined_populations_simplified 11 3.296 0.2997 5.286 7.34e-08 ***
## Residuals 435 24.663 0.0567
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Since there are significant differences, check which pairs actually
differ (Tukey Honest Significant Differences):
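A small helper can reduce the full Tukey table to the pairs that differ. This is a minimal sketch using only base R `stats`. The function name `significant_pairs` is hypothetical, and the built-in `PlantGrowth` data stand in for `ind2_data_anova`. For this notebook, the call would be `significant_pairs(res.anova.extant, "defined_populations_simplified")`.

```r
# Run Tukey HSD on a fitted ANOVA and keep only the pairs whose
# adjusted p-value is below alpha (hypothetical helper, base R only)
significant_pairs <- function(aov_fit, term, alpha = 0.05) {
  tk <- TukeyHSD(aov_fit, which = term)[[term]]  # matrix: diff, lwr, upr, p adj
  keep <- tk[, "p adj"] < alpha
  data.frame(
    pair  = rownames(tk)[keep],
    diff  = tk[keep, "diff"],
    lwr   = tk[keep, "lwr"],
    upr   = tk[keep, "upr"],
    p_adj = tk[keep, "p adj"],
    row.names = NULL
  )
}

# Example with the built-in PlantGrowth data
fit <- aov(weight ~ group, data = PlantGrowth)
res <- significant_pairs(fit, "group")
res
```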
One-way ANOVA for the effect of the country on indicator 2, removing outliers >1000 pops
# subset data without outliers
ind2_data_anova<- ind2_data %>%
filter(indicator2<1000)
# summary of n per variable
table(ind2_data_anova$country_assessment)
##
## australia belgium france japan mexico
## 26 20 27 50 23
## south_africa sweden united_states
## 90 72 139
# One way ANOVA method
res.anova.extant<-aov(indicator2 ~ country_assessment, data=ind2_data_anova)
summary(res.anova.extant)
## Df Sum Sq Mean Sq F value Pr(>F)
## country_assessment 7 8.117 1.1596 25.66 <2e-16 ***
## Residuals 439 19.842 0.0452
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Since there are significant differences, check which pairs actually
differ (Tukey Honest Significant Differences):
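Group sizes by country are very unequal, ranging from 20 to 139 records. For that reason, a rank-based Kruskal-Wallis test is a common cross-check on a one-way ANOVA. The sketch below puts both p-values side by side. The helper name `compare_anova_kw` is hypothetical, and `PlantGrowth` stands in for the notebook data. For this section, the call would be `compare_anova_kw(indicator2 ~ country_assessment, data = ind2_data_anova)`.

```r
# Compare a one-way ANOVA with its rank-based counterpart (base R 'stats')
compare_anova_kw <- function(formula, data) {
  fit <- aov(formula, data = data)
  anova_p <- summary(fit)[[1]][["Pr(>F)"]][1]   # p-value of the single factor
  kw <- kruskal.test(formula, data = data)      # Kruskal-Wallis rank sum test
  c(anova_p = anova_p, kruskal_p = unname(kw$p.value))
}

p <- compare_anova_kw(weight ~ group, data = PlantGrowth)
p
```

If both tests agree, the conclusion does not hinge on the normality assumption of the ANOVA.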
One-way ANOVA for the effect of the taxonomic group on indicator 2, removing outliers >1000 pops and taxonomic groups with too few data
# summary of n per variable
table(ind2_data$taxonomic_group)
##
## amphibian angiosperm bird bryophyte fish
## 49 222 87 4 57
## fungus gymnosperm invertebrate mammal other
## 1 17 133 134 18
## pteridophytes reptile
## 12 60
# subset data
ind2_data_anova<- ind2_data %>%
filter(indicator2<1000) %>%
filter(taxonomic_group %!in% c("fungus", "bryophyte", "other"))
# summary of n per variable
table(ind2_data_anova$taxonomic_group)
##
## amphibian angiosperm bird fish gymnosperm
## 36 132 43 40 9
## invertebrate mammal pteridophytes reptile
## 69 64 8 31
# One way ANOVA
res.anova.extant<-aov(indicator2 ~ taxonomic_group, data=ind2_data_anova)
summary(res.anova.extant)
## Df Sum Sq Mean Sq F value Pr(>F)
## taxonomic_group 8 3.438 0.4298 7.528 2.06e-09 ***
## Residuals 423 24.149 0.0571
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Since there are significant differences, check which pairs actually
differ (Tukey Honest Significant Differences):
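The taxonomic-group subset above drops groups by name with the custom `%!in%` operator. The sketch below assumes `%!in%` is simply the negation of `%in%`, since its actual definition is among the custom functions loaded earlier. It also shows a count threshold as an alternative to a hard-coded list. The helper `drop_small_groups` and the toy rows are hypothetical. A threshold alone would not reproduce the subset used here, because the notebook also drops `other`, which has 18 records.

```r
# Assumed definition of the custom operator: negation of %in%
`%!in%` <- function(x, table) !(x %in% table)

# Keep only rows whose group has at least min_n records (hypothetical helper)
drop_small_groups <- function(df, group_col, min_n) {
  counts <- table(df[[group_col]])
  keep <- names(counts)[counts >= min_n]
  df[df[[group_col]] %in% keep, , drop = FALSE]
}

# Toy data, values invented for illustration
toy <- data.frame(
  taxonomic_group = c(rep("angiosperm", 5), rep("bird", 3), "fungus"),
  indicator2      = c(0.9, 0.8, 1, 0.7, 0.6, 0.5, 1, 0.4, 0.3)
)
kept <- drop_small_groups(toy, "taxonomic_group", min_n = 3)
table(kept$taxonomic_group)
c("fungus", "bird") %!in% "fungus"
```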
One-way ANOVA for the effect of the global IUCN Red List category on indicator 2, removing outliers >1000 pops and IUCN categories with too few data (dd, unknown)
# summary of n per variable
table(ind2_data$global_IUCN)
##
## cr dd en lc not_assessed nt
## 54 13 83 224 258 70
## unknown vu
## 5 87
# subset data
ind2_data_anova<- ind2_data %>%
filter(indicator2<1000) %>%
filter(global_IUCN %!in% c("dd", "unknown"))
# summary of n per variable
table(ind2_data_anova$global_IUCN)
##
## cr en lc not_assessed nt vu
## 32 51 103 154 40 55
# One way ANOVA method
res.anova.extant<-aov(indicator2 ~ global_IUCN, data=ind2_data_anova)
summary(res.anova.extant)
## Df Sum Sq Mean Sq F value Pr(>F)
## global_IUCN 5 0.092 0.01834 0.29 0.918
## Residuals 429 27.099 0.06317
There are no significant differences among global IUCN categories
(F = 0.29, p = 0.918), so a pairwise Tukey Honest Significant Differences test is not warranted here.
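An effect size helps separate "no detectable effect" from "small effect". Eta squared is the share of the total sum of squares explained by a term, SS_term / SS_total. From the tables in this section, it is about 0.003 for global IUCN (0.092 / (0.092 + 27.099)) and about 0.29 for country (8.117 / (8.117 + 19.842)). A minimal sketch follows; the helper name `eta_squared` is hypothetical and `PlantGrowth` stands in for the notebook data.

```r
# Eta squared for each term of an aov fit: SS_term / SS_total
eta_squared <- function(aov_fit) {
  tab <- summary(aov_fit)[[1]]        # ANOVA table; last row is Residuals
  ss <- tab[["Sum Sq"]]
  terms <- trimws(rownames(tab))      # summary.aov pads row names with spaces
  setNames(ss[-length(ss)] / sum(ss), terms[-length(terms)])
}

fit <- aov(weight ~ group, data = PlantGrowth)
eta <- eta_squared(fit)
eta
```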
One-way ANOVA for the effect of the species range on indicator 2
# summary of n per variable
table(ind2_data$species_range)
##
## restricted unknown wide_ranging
## 439 16 339
# subset data
ind2_data_anova<- ind2_data %>%
filter(species_range != "unknown")
# summary of n per variable
table(ind2_data_anova$species_range)
##
## restricted wide_ranging
## 439 339
# One way ANOVA method
res.anova.extant<-aov(indicator2 ~ species_range, data=ind2_data_anova)
summary(res.anova.extant)
## Df Sum Sq Mean Sq F value Pr(>F)
## species_range 1 0.205 0.20508 3.267 0.0714 .
## Residuals 439 27.558 0.06278
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 337 observations deleted due to missingness
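With only two levels, restricted and wide_ranging, this one-way ANOVA is equivalent to a pooled-variance (Student) t-test, with F = t². The reported F = 3.267 therefore corresponds to |t| ≈ 1.81. The sketch checks the identity on a two-group subset of the built-in `PlantGrowth` data. Setting `var.equal = FALSE` instead gives Welch's test, which does not assume equal variances in the two groups.

```r
# Two-group subset of the built-in PlantGrowth data
two <- droplevels(subset(PlantGrowth, group %in% c("ctrl", "trt1")))

# F statistic and p-value from a one-way ANOVA with two groups
tab <- summary(aov(weight ~ group, data = two))[[1]]
f_value <- tab[["F value"]][1]
p_anova <- tab[["Pr(>F)"]][1]

# Pooled-variance (Student) t-test for the same comparison
tt <- t.test(weight ~ group, data = two, var.equal = TRUE)
t_value <- unname(tt$statistic)

# The two tests agree: F equals t squared and the p-values match
all.equal(f_value, t_value^2)
all.equal(p_anova, tt$p.value)
```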
One-way ANOVA for the effect of rarity on indicator 2
# summary of n per variable
table(ind2_data_firstmulti$rarity)
##
## not_rare rare_natural rare_recent
## 305 308 143
# subset data
ind2_data_anova<- ind2_data_firstmulti
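# One way ANOVA method: rarity as the predictor
# note: the output shown below is labelled species_range, i.e. it comes from a run with species_range in the formula
res.anova.extant<-aov(indicator2 ~ rarity, data=ind2_data_anova)
summary(res.anova.extant)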
## Df Sum Sq Mean Sq F value Pr(>F)
## species_range 2 0.493 0.24664 3.918 0.0206 *
## Residuals 417 26.251 0.06295
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 336 observations deleted due to missingness
Two-way ANOVA with interaction between the method to define populations and the country, removing outliers >200 pops. Question: is the interaction model correct, or should it be additive?
## Call:
## aov(formula = indicator2 ~ defined_populations_simplified * country_assessment,
## data = ind2_data_anova)
##
## Terms:
## defined_populations_simplified country_assessment
## Sum of Squares 3.296467 5.315673
## Deg. of Freedom 11 7
## defined_populations_simplified:country_assessment Residuals
## Sum of Squares 2.036565 17.310345
## Deg. of Freedom 32 396
##
## Residual standard error: 0.2090765
## 45 out of 96 effects not estimable
## Estimated effects may be unbalanced
## Df Sum Sq Mean Sq F value
## defined_populations_simplified 11 3.296 0.2997 6.856
## country_assessment 7 5.316 0.7594 17.372
## defined_populations_simplified:country_assessment 32 2.037 0.0636 1.456
## Residuals 396 17.310 0.0437
## Pr(>F)
## defined_populations_simplified 1.33e-10 ***
## country_assessment < 2e-16 ***
## defined_populations_simplified:country_assessment 0.0554 .
## Residuals
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Since there are significant differences, check which pairs actually differ (Tukey Honest Significant Differences):
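One way to address the additive-versus-interaction question is to fit both models to the same rows and compare them with an F test via `anova()`. A small p-value favours keeping the interaction. The sketch uses the built-in `warpbreaks` data as a stand-in. For this notebook, the two formulas would be `indicator2 ~ defined_populations_simplified + country_assessment` and `indicator2 ~ defined_populations_simplified * country_assessment`. Many method × country cells are empty here (45 of 96 effects not estimable). As a result, the interaction test rests on relatively few informative cells.

```r
# Fit the additive and the interaction model to the same data
fit_add <- lm(breaks ~ wool + tension, data = warpbreaks)
fit_int <- lm(breaks ~ wool * tension, data = warpbreaks)

# Nested-model F test: does adding the interaction improve the fit?
cmp <- anova(fit_add, fit_int)
cmp
```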
Two-way ANOVA with interaction between the method to define populations and the country, but keeping only country × method combinations with at least 15 records
# variables with enough n
enough_n<-ind2_data %>%
group_by(country_assessment, defined_populations_simplified) %>%
summarise(n=n()) %>%
filter(n>=15)
## `summarise()` has grouped output by 'country_assessment'. You can override
## using the `.groups` argument.
# subset data without outliers and with enough n:
# semi_join() keeps the rows of ind2_data whose (country_assessment, defined_populations_simplified)
# pair appears in enough_n, i.e. the same rows as testing each country and its methods one by one
ind2_data_anova<- ind2_data %>%
filter(indicator2<200) %>%
semi_join(enough_n, by = c("country_assessment", "defined_populations_simplified"))
# summary of n per variable
ind2_data_anova %>%
group_by(country_assessment, defined_populations_simplified) %>% summarise(n=n())
## `summarise()` has grouped output by 'country_assessment'. You can override
## using the `.groups` argument.
# Two-way ANOVA with interaction effect of the method to define populations and the country
res.anova.extant<-aov(indicator2 ~ defined_populations_simplified * country_assessment , data=ind2_data_anova)
res.anova.extant
## Call:
## aov(formula = indicator2 ~ defined_populations_simplified * country_assessment,
## data = ind2_data_anova)
##
## Terms:
## defined_populations_simplified country_assessment
## Sum of Squares 4.245618 2.775843
## Deg. of Freedom 8 5
## defined_populations_simplified:country_assessment Residuals
## Sum of Squares 0.489659 12.931021
## Deg. of Freedom 3 300
##
## Residual standard error: 0.2076136
## 55 out of 72 effects not estimable
## Estimated effects may be unbalanced
summary(res.anova.extant)
## Df Sum Sq Mean Sq F value
## defined_populations_simplified 8 4.246 0.5307 12.312
## country_assessment 5 2.776 0.5552 12.880
## defined_populations_simplified:country_assessment 3 0.490 0.1632 3.787
## Residuals 300 12.931 0.0431
## Pr(>F)
## defined_populations_simplified 3.00e-15 ***
## country_assessment 2.34e-11 ***
## defined_populations_simplified:country_assessment 0.0108 *
## Residuals
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Since there are significant differences, check which pairs actually differ (Tukey Honest Significant Differences):
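The retained method × country cells have unequal numbers of records ("Estimated effects may be unbalanced"). In that situation, `summary(aov())` reports sequential (Type I) sums of squares, so each term's test depends on the order of terms in the formula. The sketch makes the built-in `warpbreaks` data unbalanced by dropping five rows and shows that the sum of squares for `wool` changes with term order. `drop1()` gives tests for each term adjusted for all the others that are allowed under marginality. In an interaction model, it only tests the interaction.

```r
# Make the balanced warpbreaks design unbalanced (rows 1-5 are wool A, tension L)
unbalanced <- warpbreaks[-(1:5), ]

# Sequential (Type I) sum of squares for wool, entered first vs. second
ss_wool_first  <- anova(lm(breaks ~ wool + tension, data = unbalanced))["wool", "Sum Sq"]
ss_wool_second <- anova(lm(breaks ~ tension + wool, data = unbalanced))["wool", "Sum Sq"]
c(wool_first = ss_wool_first, wool_second = ss_wool_second)

# Each term tested after all the others, regardless of order
drop1(lm(breaks ~ wool + tension, data = unbalanced), test = "F")
```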
Indicator 3 refers to the number (count) of taxa by country in which
genetic monitoring is occurring. This is stored in the variable
temp_gen_monitoring as a “yes/no” answer for each taxon, so
to estimate the indicator, we only need to count how many said “yes”,
keeping only one of the records when the taxon was multiassessed:
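A minimal base-R sketch of that count follows. It assumes one record is kept per taxon within each country, namely the first one, because `duplicated()` flags only later repeats. The toy rows are invented. Only the column names `country_assessment`, `taxon` and `temp_gen_monitoring` come from the notebook.

```r
# Toy data with the notebook's column names; values invented
ind3_toy <- data.frame(
  country_assessment  = c("mexico", "mexico", "mexico", "sweden", "sweden"),
  taxon               = c("taxon_1", "taxon_1", "taxon_2", "taxon_3", "taxon_4"),
  temp_gen_monitoring = c("yes", "yes", "no", "yes", "yes")
)

# Keep the first record of each taxon within a country
one_per_taxon <- ind3_toy[!duplicated(ind3_toy[, c("country_assessment", "taxon")]), ]

# Indicator 3: number of taxa with genetic monitoring ("yes") per country
counts <- tapply(one_per_taxon$temp_gen_monitoring == "yes",
                 one_per_taxon$country_assessment, sum)
indicator3 <- setNames(as.vector(counts), names(counts))
indicator3
```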
Plot indicator 3 by country:
Relatively few taxa have genetic monitoring, but many have some sort of genetic study. Let’s check that, but first subset the ind3_data, keeping only taxa assessed a single time plus the first record of those assessed multiple times.
Sankey plot of genetic studies
Similar, but as an alluvial plot with the flows colored by country
## [1] "#F8766D" "#F8766D" "#F8766D" "#F8766D" "#F8766D" "#F8766D"
Plots by country (see the for loop above).
## R version 4.2.1 (2022-06-23)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur ... 10.16
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] cowplot_1.1.1 viridis_0.6.3 viridisLite_0.4.0 alluvial_0.1-2
## [5] ggsankey_0.0.99999 ggplot2_3.4.1 stringr_1.4.0 utile.tools_0.2.7
## [9] dplyr_1.0.9 tidyr_1.2.0
##
## loaded via a namespace (and not attached):
## [1] highr_0.9 pillar_1.7.0 bslib_0.3.1 compiler_4.2.1
## [5] jquerylib_0.1.4 tools_4.2.1 digest_0.6.29 gtable_0.3.0
## [9] jsonlite_1.8.0 evaluate_0.15 lifecycle_1.0.3 tibble_3.1.7
## [13] pkgconfig_2.0.3 rlang_1.0.6 cli_3.6.0 DBI_1.1.3
## [17] rstudioapi_0.13 yaml_2.3.5 xfun_0.31 fastmap_1.1.0
## [21] gridExtra_2.3 withr_2.5.0 knitr_1.39 generics_0.1.3
## [25] vctrs_0.5.2 sass_0.4.1 grid_4.2.1 tidyselect_1.1.2
## [29] glue_1.6.2 R6_2.5.1 fansi_1.0.3 rmarkdown_2.14
## [33] farver_2.1.1 purrr_0.3.4 magrittr_2.0.3 scales_1.2.0
## [37] ellipsis_0.3.2 htmltools_0.5.5 assertthat_0.2.1 colorspace_2.0-3
## [41] labeling_0.4.2 utf8_1.2.2 stringi_1.7.6 munsell_0.5.0
## [45] crayon_1.5.1